Programming Massively Parallel Processors: A Hands-on Approach: Beyond Linear Arrays: Scaling to Multidimensional Data

Welcome to The Great Shift. In CPU programming, we define how to iterate; in GPGPU programming, we define what a single iteration looks like. This shift from instruction-centric logic to data-centric logic is driven by the Kernel Abstraction.

1. The __global__ Blueprint

When you use the __global__ qualifier, you are not just writing a function; you are designing a scalable blueprint. Each individual execution of the kernel is an independent unit of work, which lets the GPU orchestrate thousands of identical tasks across its many cores without manual thread management.
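
A minimal sketch of such a blueprint follows. The kernel name scaleByTwo, the array size, and the single-block launch are assumptions chosen for illustration, and error checking is left out to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: __global__ marks it as callable from the host
// and executed on the device.
__global__ void scaleByTwo(float *data)
{
    // The body describes ONE unit of work; there is no loop over elements.
    int i = threadIdx.x;          // single-block launch, so the local index suffices
    data[i] = 2.0f * data[i];
}

int main()
{
    const int n = 8;              // illustrative size, small enough for one block
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // One block of n threads: n independent executions of the kernel body.
    scaleByTwo<<<1, n>>>(dev);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i) printf("%g ", host[i]);
    printf("\n");
    return 0;
}
```

Compiled with nvcc and run on a CUDA-capable device, this should print the input values doubled. The host code never creates or joins threads; it only states how many copies of the blueprint to run.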

2. The Global Address Resolver

How does a single thread among millions find its target? It uses a deterministic contract known as the indexing formula:

$$\text{threadID} = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$

This formula acts as a coordinate system, connecting the software's logical data (the array) to the hardware's execution hierarchy (blocks and threads).
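
As a worked instance of the formula, assume a hypothetical launch with blockDim.x = 128, and consider the thread with blockIdx.x = 3 and threadIdx.x = 5:

$$\text{threadID} = 3 \times 128 + 5 = 389$$

Blocks 0, 1, and 2 together cover indices 0 through 383, so the thread at local position 5 in block 3 lands on element 389. No two threads in the grid compute the same value, which is what makes the mapping a reliable contract.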

3. Execution Configuration

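The <<<B, T>>> parameters define the shape of the grid: B blocks of T threads each. Because blocks are independent of one another, the hardware is free to distribute them across however many SMs are available. This is what enables Transparent Scalability: your code executes the same logic regardless of whether the hardware has 2 SMs or 80 SMs.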
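
The sketch below shows one common way to choose B and T for an array of n elements. The kernel name vecAdd, the value T = 256, and n = 1000 are assumptions for illustration, not values prescribed by this lesson. Since B is rounded up, the last block may contain surplus threads, so the kernel guards its access with an if (i < n) check.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    if (i < n) {                                    // surplus threads do nothing
        c[i] = a[i] + b[i];
    }
}

int main()
{
    const int n = 1000;                      // deliberately not a multiple of T
    const size_t bytes = n * sizeof(float);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    const int T = 256;                       // threads per block (assumed value)
    const int B = (n + T - 1) / T;           // ceiling division: 4 blocks here
    vecAdd<<<B, T>>>(dA, dB, dC, n);

    // Report launch errors such as an invalid configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    printf("c[0] = %g, c[n-1] = %g\n", c[0], c[n - 1]);
    return 0;
}
```

With these values the launch creates 4 × 256 = 1024 threads for 1000 elements, and the 24 extra threads fail the bounds check and exit without touching memory. Nothing in the source mentions the number of SMs, so the same program can run unchanged on a small or a large GPU.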

QUESTION 1

What is the primary role of the __global__ qualifier?

To define a function that runs on the CPU and is called by the GPU.

To mark a function as a kernel that is callable from the host and executes on the device.

To synchronize all threads across the entire GPU grid.

To allocate memory in the global memory space.

QUESTION 2

If blockIdx.x = 2, blockDim.x = 256, and threadIdx.x = 10, what is the global index?

266

512

522

778

QUESTION 3

What does 'Transparent Scalability' imply in CUDA?

The memory automatically scales with the size of the input array.

The same code can run on different GPUs with varying SM counts without modification.

Threads can see into the registers of other threads.

The kernel speed increases linearly with the clock speed of the CPU.

QUESTION 4

Why is the if (i < n) check necessary in a kernel?

To prevent the GPU from overheating.

To ensure threads do not access memory outside the valid array bounds.

To check if the kernel is running on the correct SM.

To synchronize memory access between threads.

QUESTION 5

Which variable represents the number of threads within a single block?

gridDim.x

blockIdx.x

blockDim.x

threadIdx.x
